Abstract

Forecasts are important for business decisions such as how much to produce. For many aircraft manufacturers, even a 1% increase in forecasting accuracy means an increase of millions of dollars in revenue. In this project, two models are developed as a stepping stone toward predicting the number of aircraft delivered in the future. The first model uses time series analysis to predict the export value of new aircraft. The second applies macroeconomic variables in a multilevel mixed-effects model to predict the yearly number of passengers carried for each country.

Introduction

Background

By any measure, the commercial aviation sector is soaring. More people are taking to the air than ever before, as the aviation industry has now recorded eight straight years of steady, above-trend growth. In the plot below, we can see that the number of passengers carried each year has grown at an accelerating pace, especially after 2010.

Number of passengers carried by aircraft (source: https://tradingeconomics.com/world/air-transport-passengers-carried-wb-data.html)

In the past few months, however, an earthshaking trade war has broken out between China and the United States. Among the many retaliatory policies that were announced was a 25 percent tariff on U.S. civil aircraft with an empty weight of 15,000-45,000 kilograms, a weight range that sits below the Boeing model line. This was more of a warning shot to the U.S. administration to proceed no further, but it stirred some concern in the aviation industry.

As a former commercial pilot, I am personally interested in using time series prediction to find out whether this tariff has any potential impact. It is important to keep in mind that the aircraft delivery process has a very long cycle. For example, an order for four aircraft can take several years before the last aircraft is delivered. Therefore, it is difficult to draw a conclusion about the impact of the tariff, given that only a few months have passed since the Chinese government published its policy.

After discussing the time series model, the focus shifts to a multilevel model that predicts the number of passengers carried each year in an individual country. To keep this project succinct, I chose to predict the number of passengers each year rather than the number of aircraft delivered, because predicting deliveries entails factors such as aircraft size, retirements, and order cancellations. The graph below shows the dynamic relationship between the state of the economy and the airline industry.

Fig 1

Previous Work

Since Airbus and Boeing keep their forecasting models as commercial secrets, our only available resources are previous studies on forecasting commercial airline demand. For example, Jacobson (1970) used a linear regression model to predict trips originating from an airport using two independent variables: average income and airfare. His prediction had an R-squared value of 0.82. Another example is Haney, who used socioeconomic variables to represent the city surrounding the airport: population, total personal income, fares, distance, time, highway miles, and passenger originations.

Method and materials (Input variables)

Since we are only interested in the travel demand of a given country, we will look at the following variables: GDP-worldwide per capita, GDP growth rate, interest rate, inflation rate, jet fuel price, crude oil price, passengers carried in the last two years, and coastline length. Besides these variables, I also use group indicators for the income group of each country, from high income to low income, according to the World Bank website. Another group indicator is region, which describes the geographical position of the country. One interesting predictor variable in the model is coastline length. Generally, a country with a longer coastline has a larger land area, so people are more likely to travel by commercial aircraft because other modes of transportation can be slow or inconvenient.

The data were collected from the following websites: U.S. Census Bureau, World Bank Open Data, U.S. Energy Information Administration, and Central Intelligence Agency.

Results for time series prediction

Model choice

Let’s take a look at the aircraft export values for the top 5 countries starting in January 2004: France, United Kingdom, Canada, Brazil, and China. We can see an overall increase in aircraft export values for these countries, as well as some seasonality in the aviation industry.

In the next step, we take a closer look at the value of aircraft exported to China. I have made an interactive plot so that readers can explore it.

Interpretation

The series looks quite chaotic, so I use time series analysis to decompose it. Time series analysis is a statistical technique for data recorded over a series of time periods or intervals, and it includes trend analysis. Time series forecasting is the use of a model to predict future values based on previously observed values.

A time series decomposition has four parts: data, remainder, seasonal, and trend. The picture below explains the relationship between the four parts.

Time series

From the graph below, we can see that the remainder fluctuates around 0 and that the trend in aircraft export value increases from 2005 to 2018. There is a slight drop in the trend around 2008, which coincides with the financial crisis of 2007-2008.
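The relationship between the four parts can be sketched in a few lines of Python. This is only a minimal illustration, assuming an additive model (data = trend + seasonal + remainder) and a classical moving-average decomposition; the report does not state which routine produced its plot, and the series below is synthetic, not the export data. The function name `decompose_additive` is hypothetical.

```python
import math

def decompose_additive(series, period=12):
    """Classical additive decomposition: series = trend + seasonal + remainder."""
    n = len(series)
    half = period // 2
    trend = [None] * n
    # Centered 2 x period moving average (valid for an even period such as 12).
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    # Average the detrended values for each position in the seasonal cycle.
    sums = [0.0] * period
    counts = [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += series[t] - trend[t]
            counts[t % period] += 1
    means = [s / c for s, c in zip(sums, counts)]
    center = sum(means) / period  # shift so the seasonal effects sum to zero
    seasonal = [means[t % period] - center for t in range(n)]
    remainder = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, remainder

# Synthetic monthly series (hypothetical): linear trend plus a 12-month cycle.
series = [100 + 2 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(60)]
trend, seasonal, remainder = decompose_additive(series)
```

Because the synthetic series contains no noise, the remainder is essentially zero wherever the trend is defined; with real data, the remainder is what is left after the trend and seasonal parts are subtracted.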

Model checking

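The plot of residuals from the ARIMA(1,1,2) model shows that all autocorrelations are within the threshold limits (dotted blue lines). This indicates that the residuals behave like white noise. The Ljung-Box test below agrees: its p-value of 0.3887 is well above 0.05, so we cannot reject the hypothesis that the residuals are uncorrelated. We also know that some drops around 2003 and 2008 were caused by two economic crises.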

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,1,2)(2,0,0)[12] with drift
## Q* = 19.052, df = 18, p-value = 0.3887
## 
## Model df: 6.   Total lags used: 24
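To make the test output above easier to read, here is a small sketch of how the Ljung-Box statistic is formed and how the reported numbers fit together. The helper names are hypothetical, and the critical value is the standard table value for a chi-squared distribution with 18 degrees of freedom at the 5% level; the residuals themselves are not reproduced here, so the decision is based on the reported Q* only.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    numer = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return numer / denom

def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q* = n (n + 2) * sum_k r_k^2 / (n - k)."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )

# Values copied from the test output above.
q_star = 19.052
total_lags = 24
model_df = 6
df = total_lags - model_df  # degrees of freedom: 24 - 6 = 18

# 95th percentile of the chi-squared distribution with 18 df (table value).
CHI2_CRIT_95_DF18 = 28.869

# Residual autocorrelation is only indicated if Q* exceeds the critical value.
reject_white_noise = q_star > CHI2_CRIT_95_DF18
print(df, reject_white_noise)
```

Since 19.052 is below 28.869, the white-noise hypothesis is not rejected, which matches the reported p-value of 0.3887.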

We are interested in forecasting the same period, from January 2004 to September 2018, using the auto.arima function. The red line is the prediction and the black line is the actual value of aircraft exported to China. In general, our prediction matches the trend in aircraft export values. There are several places where the black line plummets, such as Jan 2016 and Jan 2017. With some calculation, I found that the gap between December 2016 and Jan 2016 corresponds to a difference of only 12 aircraft.

In the plot below, we can see the predicted value of aircraft exported to China over the next 12 months. The grey area is the 95% confidence interval.
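To show why the grey band widens as the forecast moves further into the future, the sketch below uses a much simpler stand-in than the fitted model: a random walk with drift, not the seasonal ARIMA chosen by auto.arima. The function name `drift_forecast` and the monthly values are hypothetical, and the band it produces is an approximate 95% prediction interval under that simplified model.

```python
import math

def drift_forecast(series, horizon, z=1.96):
    """Random-walk-with-drift forecasts with approximate 95% intervals."""
    n = len(series)
    drift = (series[-1] - series[0]) / (n - 1)  # average one-step change
    # One-step-ahead residuals of the drift model.
    residuals = [series[t] - series[t - 1] - drift for t in range(1, n)]
    sigma = math.sqrt(sum(e * e for e in residuals) / (len(residuals) - 1))
    forecasts = []
    for h in range(1, horizon + 1):
        point = series[-1] + h * drift
        # The standard error grows with the forecast horizon h.
        se = sigma * math.sqrt(h * (1 + h / (n - 1)))
        forecasts.append((point, point - z * se, point + z * se))
    return forecasts

# Hypothetical monthly export values (not the data used in this project).
values = [210, 225, 190, 240, 260, 230, 275, 250, 290, 270, 310, 300]
for point, lower, upper in drift_forecast(values, horizon=12)[:3]:
    print(round(point, 1), round(lower, 1), round(upper, 1))
```

Each additional month adds uncertainty, so the interval for month 12 is wider than the interval for month 1. The same qualitative behavior is expected from the ARIMA forecast, although its exact interval widths depend on the fitted coefficients.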

Results for linear model

Model choice

After checking the aircraft export values, let’s look at the relationship between time and passengers. In the plot below, each line represents a country. Most countries experienced an increase in passengers carried over the past 30 years. Some low income and lower middle income countries experienced a drop in passengers carried after 2012. On further investigation, the second plot shows that the low income countries whose passenger numbers dropped sharply are in Sub-Saharan Africa and Europe & Central Asia.

From these two animations, in which each point represents a country, we can see a strong positive relationship between GDP and passengers. It is therefore tempting to use a linear regression model to predict the number of passengers. However, it is important to keep in mind that countries in different income groups behave differently.

This is a picture explaining the definition of coastline length.

Definition of coastline length

This plot describes the relationship between coastline length and the number of passengers in a given year. Here, too, we can see a positive relationship.

Interpretation

The first forecasting model of interest is linear regression. We first look at a linear regression with some transformed variables.

The coefficients that matter most in determining the number of passengers are the log of GDP, the log of passengers in the previous year, and the log of passengers two years ago. For example, for every 10% increase in GDP, we expect the number of passengers to increase by about 2%. This can be calculated with the following equations. \[(1+0.1)^{\text{coefficient}} = 1+x\] \[(1.1)^{0.2} = 1+x\] \[x \approx 0.019\] Some other coefficients do not meet our expectation, such as mean jet fuel, because of its positive sign. We would expect jet fuel price to have a negative relationship with passengers; in other words, the higher the jet fuel price, the less likely people are to fly, because airfares are usually higher.
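The same arithmetic can be checked numerically. The snippet below is a small sketch of the log-log interpretation; the function name is hypothetical, and 0.20 is the rounded log_GDP coefficient from the regression output.

```python
def pct_change_in_passengers(coefficient, pct_change_in_gdp):
    """Relative change in passengers implied by a log-log coefficient."""
    return (1 + pct_change_in_gdp) ** coefficient - 1

# A 10% increase in GDP with the log_GDP coefficient of 0.20.
x = pct_change_in_passengers(0.20, 0.10)
print(f"{x:.3f}")  # prints 0.019

# For a small change, the coefficient is close to the elasticity itself:
# a 1% increase in GDP gives roughly a 0.2% increase in passengers.
small = pct_change_in_passengers(0.20, 0.01)
print(f"{small:.4f}")  # prints 0.0020
```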

The group indicators agree with our expectation. For example, the coefficient for low income countries (-0.19) has a larger magnitude than the one for lower middle income countries (-0.15). This means that the poorer a country is, the less likely its people are to take commercial aircraft.

## lm(formula = log_passenger ~ log_GDP + sd_GDP_growth + sd_Inflation + 
##     sd_Interest + mean_jet_fuel + log_oil + log_coastline + IncomeGroup + 
##     Region + log_previous_year + log_two_previous_year, data = a)
##                                  coef.est coef.se
## (Intercept)                      -0.55     0.18  
## log_GDP                           0.20     0.01  
## sd_GDP_growth                     0.03     0.01  
## sd_Inflation                      0.00     0.01  
## sd_Interest                      -0.03     0.01  
## mean_jet_fuel                     0.01     0.02  
## log_oil                           0.01     0.03  
## log_coastline                     0.02     0.00  
## IncomeGroupUpper middle income   -0.11     0.03  
## IncomeGroupLower middle income   -0.15     0.03  
## IncomeGroupLow income            -0.19     0.04  
## RegionEurope & Central Asia      -0.24     0.03  
## RegionLatin America & Caribbean  -0.15     0.03  
## RegionMiddle East & North Africa -0.10     0.04  
## RegionNorth America              -0.05     0.06  
## RegionSouth Asia                 -0.11     0.04  
## RegionSub-Saharan Africa         -0.26     0.03  
## log_previous_year                 0.46     0.01  
## log_two_previous_year             0.25     0.01  
## ---
## n = 2581, k = 19
## residual sd = 0.45, R-Squared = 0.96

Model checking

The first plot shows the confidence intervals of all the coefficients. If the horizontal solid line crosses the dashed line at 0, the coefficient is not statistically significant. For example, the log oil coefficient is probably not statistically significant.
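The rule for reading the coefficient plot can also be applied directly to the regression table. The sketch below assumes a normal approximation (estimate plus or minus 1.96 standard errors), which is reasonable for n = 2581, and uses a handful of rounded estimates and standard errors copied from the linear regression output; the function names are hypothetical.

```python
# (estimate, standard error) pairs copied from the linear regression output.
coefficients = {
    "log_GDP": (0.20, 0.01),
    "sd_Interest": (-0.03, 0.01),
    "mean_jet_fuel": (0.01, 0.02),
    "log_oil": (0.01, 0.03),
    "RegionNorth America": (-0.05, 0.06),
}

def confidence_interval(estimate, std_error, z=1.96):
    """Approximate 95% interval using the normal critical value."""
    return estimate - z * std_error, estimate + z * std_error

def crosses_zero(estimate, std_error):
    """True when the approximate 95% interval contains 0."""
    lower, upper = confidence_interval(estimate, std_error)
    return lower <= 0 <= upper

not_significant = [
    name for name, (est, se) in coefficients.items() if crosses_zero(est, se)
]
print(not_significant)
# prints ['mean_jet_fuel', 'log_oil', 'RegionNorth America']
```

Because the table reports values rounded to two decimals, this check is only approximate for coefficients that are close to the boundary.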

Next we look at the residual plot. Most points are spread evenly on both sides of the 0 line. Several points, such as observation 7932, deviate far from the 0 line. After examining this observation, I found that this is because some important values, such as the number of passengers in the previous year, are missing.

In this plot, the fitted values and the actual log passenger values together form a straight line. The two red dashed lines are the boundaries of the 95% confidence interval. Some of the actual values fall within our prediction range, but many fall outside it.

Next we use a linear mixed-effects model to predict the number of passengers. As in the linear regression model, GDP and the passenger counts of the previous year and of two years ago are the primary predictors of the number of passengers this year. The difference is that each country has its own intercept (starting value) and slope (growth rate).

## lmer(formula = log_passenger ~ new_year + sd_fuel + log_GDP + 
##     IncomeGroup + log_previous_year + log_two_previous_year + 
##     (1 + new_year | `Country Name`), data = a)
##                       coef.est coef.se
## (Intercept)            0.49     0.33  
## new_year               0.00     0.00  
## sd_fuel                0.00     0.01  
## log_GDP                0.33     0.01  
## IncomeGroup           -0.25     0.04  
## log_previous_year      0.32     0.01  
## log_two_previous_year  0.11     0.01  
## 
## Error terms:
##  Groups       Name        Std.Dev. Corr  
##  Country Name (Intercept) 0.77           
##               new_year    0.02     -0.75 
##  Residual                 0.34           
## ---
## number of obs: 5825, groups: Country Name, 180
## AIC = 5202.7, DIC = 5079.4
## deviance = 5130.0
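Written out, the lmer formula above corresponds to the following model for observation \(i\) in country \(j\). The symbols \(\beta\), \(u\), \(\varepsilon\), \(\sigma\) and \(\rho\) are notation introduced here for illustration, and IncomeGroup is shown as a single term because the output reports a single coefficient for it.

\[
\begin{aligned}
\text{log\_passenger}_{ij} ={}& (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\,\text{new\_year}_{ij} + \beta_2\,\text{sd\_fuel}_{ij} + \beta_3\,\text{log\_GDP}_{ij} \\
&+ \beta_4\,\text{IncomeGroup}_{j} + \beta_5\,\text{log\_previous\_year}_{ij} + \beta_6\,\text{log\_two\_previous\_year}_{ij} + \varepsilon_{ij}
\end{aligned}
\]

\[
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} \sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_0^2 & \rho\,\sigma_0\sigma_1 \\ \rho\,\sigma_0\sigma_1 & \sigma_1^2 \end{pmatrix} \right), \qquad \varepsilon_{ij} \sim N(0, \sigma^2)
\]

In the output, the estimates are \(\sigma_0 = 0.77\) (country intercepts), \(\sigma_1 = 0.02\) (country slopes on new_year), \(\rho = -0.75\), and \(\sigma = 0.34\) (residual). The negative correlation suggests that countries with a higher starting value tend to have a smaller yearly slope.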

This plot is used to examine the prediction accuracy of the linear mixed-effects model. The red line represents the actual number of passengers carried in 2018, and the grey lines are the confidence intervals of the predicted values. If the actual value falls outside the predicted range, it is colored orange. We can see that this model is slightly better than the previous one.
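One way to put a number on "slightly better" is to compute the share of actual values that fall inside each model's predicted range. The sketch below shows that calculation; the function name `coverage` is hypothetical and all values are made up for illustration, so they are not results from this project.

```python
def coverage(actual, lower, upper):
    """Share of actual values that fall inside [lower, upper]."""
    inside = [lo <= a <= hi for a, lo, hi in zip(actual, lower, upper)]
    return sum(inside) / len(inside)

# Hypothetical log-passenger values for five countries (illustration only).
actual = [15.2, 13.8, 17.1, 12.4, 16.0]

# Hypothetical interval bounds from the linear regression model.
linear_lower = [14.9, 14.0, 16.5, 12.6, 15.1]
linear_upper = [15.6, 14.8, 17.0, 13.3, 16.2]

# Hypothetical interval bounds from the mixed-effects model.
mixed_lower = [14.8, 13.5, 16.7, 12.5, 15.5]
mixed_upper = [15.5, 14.1, 17.3, 13.0, 16.3]

print(coverage(actual, linear_lower, linear_upper))
print(coverage(actual, mixed_lower, mixed_upper))
```

Applied to the real predictions, the orange points in the plot are exactly the observations that count against the coverage of the mixed-effects model.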

Discussion

Implication

This midterm project implemented two methods: one to predict the value of aircraft exports to China and one to predict the expected number of passengers for each country in a given year. The multiple regression model used the past behavior of macroeconomic indicators for prediction. It was important to examine the input variables of the multiple regression model for correlations among them, because highly correlated input variables can make the prediction less accurate.
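A simple way to screen input variables for this problem is to compute pairwise correlations before fitting. The sketch below computes a Pearson correlation for one pair of predictors; the price series are hypothetical and only illustrate the check, and the function name `pearson` is introduced here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical yearly prices (not the data used in this project).
jet_fuel_price = [1.2, 1.8, 2.1, 3.0, 1.7, 1.4]
crude_oil_price = [38, 60, 70, 100, 55, 45]

r = pearson(jet_fuel_price, crude_oil_price)
print(round(r, 2))
```

A correlation close to 1 or -1 between two predictors suggests that they carry nearly the same information, in which case keeping only one of them may give more stable coefficient estimates.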

Overall, these two models provide sufficient accuracy in prediction, and they are easy to use. These models can also be applied in other industries where the product follows a long production cycle, such as ships and trains.

Limitation

One limitation of the regression model is that it does not predict aircraft orders and deliveries, which are the primary concern of the aviation industry. Another limitation is that some input variables, such as airfare, are not incorporated in the model because such data are not easily accessible online. Adding such variables to the model would likely improve its accuracy.

Future direction

In the future, the next step would be to build on this model to predict the number of aircraft ordered.

References

Monahan, Kayla M. (2016). “Aircraft Demand Forecasting”. Masters Theses. 329. https://scholarworks.umass.edu/masters_theses_2/329

Haney, D. (1975). Review of aviation forecasting methodology. Rep. dot-40176, 6.

Boeing (2018). Current Market Outlook 2018-2037.

Hans Rosling. https://www.gapminder.org/videos/gapmindervideos/gapcast-1-health-money-sex-in-sweden/

Data sources: https://data.worldbank.org/ https://www.census.gov/ https://www.eia.gov/ https://www.cia.gov/index.html